An introduction of the problem domain and a description of the variable(s) you are choosing to analyze (and why!)
Write a summary paragraph of findings that includes the 5 values calculated from your summary information R script
These will likely be calculated using your DPLYR skills, answering questions such as:
Feel free to calculate and report values that you find relevant. Again, remember that the purpose is to think about how these measures of incarceration vary by race.
Who collected the data?
- This data was collected by the Vera Institute of Justice
How was the data collected or generated?
- This data was assembled using data collected by the U.S. Department of
Justice's Bureau of Justice Statistics.
Why was the data collected?
- The data was collected to shed light on who is being sent to prison or
jail, and on the causes and consequences of that incarceration.
Organizing the data at the county level makes studies more grounded and
easier to understand.
How many observations (rows) are in your data? - I primarily used
What, if any, ethical questions or questions of power do you need to consider when working with this data? - This data is sensitive. Each count represents individuals who were sent to prison or jail, which can be a major event in their lives, so analysis of this data should not be taken lightly.
What are possible limitations or problems with this data? (at least 200 words)
- One limitation of this data set is that it does not contain information about gender identities other than male and female. It has no information on individuals who do not identify as either, which makes analysis of other genders and identities much harder.
- Another problem is missing values. For example, the aapi_pop_15to64 column has about 153,811 rows, of which about 62,780 are missing. That means about 41% of the values in this column are missing, which makes analysis much more difficult.
- Another problem is that the data set does not contain detailed information on age. It contains columns such as total_pop_15to64, which tells us how many people are between the ages of 15 and 64, but not how many people are of any specific age. It also does not give a clear idea of the number of individuals who are older than 64 or younger than 15.
- Another problem is that the data set does not explain what other_race_prison_pop means. The documentation says it covers other or unknown racial categories, but that gives little to work with. It also heavily limits how individuals are represented: anyone who is not Asian or Pacific Islander, Black, Latinx, Native American, or white has to be classified as ‘other’.
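The missing-value share reported above for aapi_pop_15to64 can be checked with a short dplyr summary. The sketch below is only an illustration: the `toy` data frame and its values are made up to stand in for the real data set, and `missing_summary` and `reported_pct` are hypothetical names. The last step repeats the same arithmetic with the two counts quoted in the text.

```r
library(dplyr)

# Hypothetical toy data standing in for the incarceration data set;
# these values are invented for illustration only.
toy <- data.frame(
  county = c("A", "B", "C", "D", "E"),
  aapi_pop_15to64 = c(120, NA, 45, NA, 300)
)

# Count the rows, count the missing values, and compute the missing share.
missing_summary <- toy %>%
  summarise(
    n_rows = n(),
    n_missing = sum(is.na(aapi_pop_15to64)),
    pct_missing = round(100 * n_missing / n_rows, 1)
  )

print(missing_summary)

# The same arithmetic using the counts reported in the text:
# about 62,780 missing values out of about 153,811 rows.
reported_pct <- round(100 * 62780 / 153811)
print(reported_pct)
```

Running the same `summarise()` call on the real data frame, with any column name swapped in, would give the missing share for that column.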
Chart:
Description and Why:
Include a chart. Make sure to describe why you included the chart, and what patterns emerged
The second chart that you will create and include will show how two different (continuous) variables are related to one another. Again, think carefully about what such a comparison means and what you want to communicate to your user (you may have to find relevant trends in the dataset first!). Here are some requirements to help guide your design:
Include a chart. Make sure to describe why you included the chart, and what patterns emerged
The last chart that you will create and include will show how a variable is distributed geographically. Again, think carefully about what such a comparison means and what you want to communicate to your user (you may have to find relevant trends in the dataset first!). Here are some requirements to help guide your design: